Abstract
A superpopulation modelling approach is used to represent the audited amounts within a population of balances or transactions. When monetary errors represent overstatements, an upper bound on the expected variance of the stratified difference estimator is derived. This result is used to stratify the population, determine an appropriate sample size, and determine a decision rule for evaluating sample results.

Stratified Sampling Using a Stochastic Model

Introduction


According to SAS No. 39, planning a statistical substantive test of details requires specifying a tolerable monetary error and an allowable risk of incorrect acceptance. The tolerable error represents the maximum monetary error that can exist in the population without causing the financial statements to be materially misstated. The risk of incorrect acceptance is the risk of the sample's supporting the conclusion that the total monetary error in the population does not exceed the tolerable error when, in fact, the total monetary error exceeds the tolerable error. The auditor may also elect to control the risk of incorrect rejection. This is the risk of the sample's supporting the conclusion that the total monetary error in the population exceeds the tolerable error when, in fact, the total monetary error is less than the tolerable error.

Statistically, the auditor may formulate this audit problem as a statistical test of hypothesis (Elliott and Rogers [1972], Roberts [1978]). This involves specifying a hypothesis and an alternative. The hypothesis states that the total monetary error in the population exceeds the tolerable error, while the alternative states that the total monetary error in the population is less than the tolerable error.

To test these hypotheses, the auditor must specify the sample size, how the sample is to be selected, and how the sample results are to be evaluated. One commonly used selection method is stratified random sampling. This entails dividing the recorded amounts into several strata and choosing a random sample from each stratum. To evaluate the results, a decision rule based on an estimate of the total monetary error can be developed.

An exact statistical solution to this testing problem requires knowing the sampling distribution of the estimated total monetary error under the hypothesis and under the alternative. If the sampling distribution were known as a function of the total monetary error in the population, a sample size and decision rule could be determined that would have the allowable risk of incorrect acceptance and, if desired, the allowable risk of incorrect rejection at some specified small amount of monetary error.

Because the sampling distribution as a function of the total monetary error is unknown, only approximate solutions are possible. The currently used testing procedures that are based on classical statistical estimators regard the sampling distribution as being approximately a normal distribution.

Even in situations where the normal approximation is appropriate, a difficulty arises because the variability of the sampling distribution (the standard error of the estimate) is related to the amount of error in the population. The fact that the variability of the population of audited or error amounts changes as the total monetary error changes was observed by Duke [1980] and Duke, Neter, and Leitch [1982] in their study of power characteristics. Their study demonstrates some of the difficulties encountered when the auditor uses a procedure that does not recognize this changing variability.
Is it possible to say anything about the variability of audited or difference amounts relative to the variability of recorded amounts as a function of the error population distribution? In this paper we suggest that this question may be answered "yes," provided we are willing to employ a plausible model for the population of audited or difference amounts.

The modelling technique entails regarding the audited amounts associated with any particular population of recorded amounts as being a realization of a particular type of chance mechanism. This technique, known in the statistical literature as a superpopulation model, permits us to derive relationships on an expected value basis.

The particular case where all monetary errors represent overstatements allows us to determine upper and lower limits for the expected variability of audited and difference amounts. The upper limits are then used to provide an approximate solution to the testing problem.

The suggested procedure provides a method for stratifying the population, determining the sample size, and evaluating the sample results. Because this procedure uses an upper bound based on a worst-case situation, it is robust in the sense that it does not require knowing how the overstatement errors are distributed.

Superpopulation Model. Cassel, Sarndal, and Wretman [1977] observe that many recent important contributions to the problem of inference in finite populations have used the superpopulation approach. In the auditing context, taking a superpopulation approach means that the observed audited amounts are regarded as realized outcomes of a prescribed random process. Superpopulation models have long been used in sampling research. Early users include Cochran [1939, 1946], Deming and Stephan [1941], and Madow and Madow [1944].
The model for the audited amounts in the population used here is

\[ X_j = (1-\theta_j)\,Y_j, \qquad j = 1, \ldots, N. \tag{1} \]

In this model, the recorded amount \(Y_j\) is not regarded as a random variable, but the associated audited amount \(X_j\) is the outcome of a random process. The random variable \(X_j\) is generated from the recorded amount \(Y_j\) by multiplying by the factor \((1-\theta_j)\). Here \(\theta_j\) is a random variable that takes on the value zero with probability \(1-\pi\) and, with probability \(\pi\), takes on a value governed by a distribution function \(F\).

Conceptually, each recorded amount is accorded an equal chance, \(\pi\), of being in error. If a monetary error exists, the magnitude of the error is determined by the value of \(\theta_j = (Y_j - X_j)/Y_j\), which measures the relative error in the recorded amount. The relative errors are considered to be generated from the same distribution function. When this distribution is confined to the interval from zero to one, all the monetary errors are overstatements. Negative values of \(\theta\) would correspond to understatement errors.

This conceptual model reasonably reflects the situation the auditor faces in a substantive test of details. The auditor knows the recorded amounts, but the associated audited amounts are unknown. An unknown fraction of the recorded amounts contain monetary errors. Because the auditor generally has no knowledge of which items are in error, it seems reasonable to suppose that each item is equally likely to contain an error. The size of the monetary error may be expressed relative to the magnitude of the recorded amount.
Kaplan [1973a] used a similar model. The chief difference, other than notation, is that he considered a second stage of randomization in which the audited amount was associated with a recorded amount selected at random from the population. His expectations were taken relative to both the random selection process and the error-producing process. Because we want to examine the structure of the audit population generated from the error-producing process, the superpopulation model defined here does not include random selection as part of the model.

Expectation and Variance of Audited Amounts. The model may be used to derive the expectation and variance of audited amounts. Using the symbol \(E_\theta\) to represent the expectation operator with respect to the random variable \(\theta\), the following relationship holds:

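\[ E_\theta X_j = Y_j\,(1 - \pi\mu_\theta), \tag{2} \]

where \(\mu_\theta\) denotes the mean of the distribution \(F\) of relative errors. By adding over all population items, it follows that

\[ E_\theta\Big(\sum_{j=1}^{N} X_j\Big) = \Big(\sum_{j=1}^{N} Y_j\Big)(1 - \pi\mu_\theta). \tag{3} \]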

This says that the expected total audited amount equals the total recorded amount multiplied by the factor \((1-\pi\mu_\theta)\). Equivalently, the difference between the total recorded amount and the expected total audited amount equals the total recorded amount multiplied by \(\pi\mu_\theta\) (the fraction of accounts in error times the mean relative error). Because of the large size of \(N\), a realization of the random process would yield a value of the actual sum of audited amounts very close to its expectation, or
\[ \sum_{j=1}^{N} X_j \doteq \Big(\sum_{j=1}^{N} Y_j\Big)(1 - \pi\mu_\theta), \]

where \(\doteq\) denotes approximate equality.
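Model (1), and the claim that a realized total lies close to its expectation, can be checked with a small simulation. The sketch below rests on several assumptions. The recorded amounts are hypothetical uniform draws. The error rate \(\pi = 0.10\) is assumed. \(F\) is hypothetically taken to be uniform on [0, 1], so that \(\mu_\theta = 0.5\) and every error is an overstatement. The helper names are illustrative only.

```python
import random

def simulate_audited(recorded, pi, draw_theta, rng):
    """One realization of model (1): X_j = (1 - theta_j) * Y_j.

    theta_j is 0 with probability 1 - pi; otherwise it is drawn by
    draw_theta, which stands in for the distribution function F.
    """
    audited = []
    for y in recorded:
        theta = draw_theta(rng) if rng.random() < pi else 0.0
        audited.append((1.0 - theta) * y)
    return audited

def uniform_f(rng):
    """Hypothetical F: relative errors uniform on [0, 1), so mu_theta = 0.5."""
    return rng.random()

rng = random.Random(2024)
recorded = [rng.uniform(50.0, 1500.0) for _ in range(20_000)]  # hypothetical Y_j
pi, mu = 0.10, 0.5                                             # assumed pi and mean of F

audited = simulate_audited(recorded, pi, uniform_f, rng)
realized_ratio = sum(audited) / sum(recorded)   # sum X_j / sum Y_j for this realization
expected_ratio = 1.0 - pi * mu                  # factor (1 - pi * mu_theta) in (3)
print(f"realized {realized_ratio:.4f}, expected {expected_ratio:.4f}")
```

With 20,000 items the realized ratio typically differs from 0.95 only in the third decimal place. This is the near-equality the text relies on.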

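The expected variance of audited amounts is

\[ E_\theta \operatorname{Var} X \doteq \big[\pi\sigma_\theta^2 + \pi(1-\pi)\mu_\theta^2 + (1-\pi\mu_\theta)^2\big]\operatorname{Var} Y + \big[\pi\sigma_\theta^2 + \pi(1-\pi)\mu_\theta^2\big]\,\bar{Y}^2 \tag{4} \]

where \(\operatorname{Var} X = \frac{1}{N}\sum_{j=1}^{N}(X_j - \bar{X})^2\) (with \(\operatorname{Var} Y\) defined in the same way), \(\bar{Y}\) is the mean recorded amount, \(\sigma_\theta^2\) is the variance of the relative error (the variance of \(F\)), and the symbol \(\doteq\) denotes approximate equality. The approximation arises from substituting one for the quantity \((N-1)/N\). Consequently, the expression on the right slightly overstates the expected variance.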
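A rough Monte Carlo check of (4) follows. It uses the same kind of hypothetical setting: recorded amounts drawn uniformly between $50 and $1,500, an assumed \(\pi = 0.10\), and a hypothetical \(F\) uniform on [0, 1], so \(\sigma_\theta^2 = 1/12\). The sketch averages the realized population variance of the audited amounts over repeated realizations of \(\theta\) and compares that average with the right side of (4). Replacing \((N-1)/N\) by one shifts the formula upward by an amount that is negligible here next to the Monte Carlo noise.

```python
import random

def pvar(values):
    """Population variance with divisor N, as in the definition of Var X."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def expected_var_audited(var_y, mean_y, pi, mu, sigma2):
    """Right-hand side of (4), an approximation to E_theta Var X."""
    var_theta = pi * sigma2 + pi * (1 - pi) * mu ** 2   # variance of theta_j
    return (var_theta + (1 - pi * mu) ** 2) * var_y + var_theta * mean_y ** 2

rng = random.Random(7)
recorded = [rng.uniform(50.0, 1500.0) for _ in range(2_000)]  # hypothetical Y_j
var_y, mean_y = pvar(recorded), sum(recorded) / len(recorded)
pi, mu, sigma2 = 0.10, 0.5, 1.0 / 12.0   # assumed: F uniform on [0, 1]

reps = 300
mc = 0.0
for _ in range(reps):
    # One realization of model (1): theta is uniform on [0, 1) with probability pi.
    audited = [(1.0 - (rng.random() if rng.random() < pi else 0.0)) * y
               for y in recorded]
    mc += pvar(audited) / reps

formula = expected_var_audited(var_y, mean_y, pi, mu, sigma2)
print(f"formula (4): {formula:,.0f}   simulated average: {mc:,.0f}")
```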

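Of special interest is the magnitude of the expected variance when the total monetary error equals the tolerable error. The following inequality holds when all monetary errors represent overstatement errors:

\[ E_\theta \operatorname{Var} X \lesssim (1-\pi\mu_\theta)\operatorname{Var} Y + \pi\mu_\theta(1-\pi\mu_\theta)\,\bar{Y}^2 \tag{5} \]

It follows from (4) because, when \(\theta\) is confined to the interval from zero to one, \(\sigma_\theta^2 \le \mu_\theta(1-\mu_\theta)\).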

Using the symbol TE to represent the tolerable error and \(Y\) to represent the total recorded amount, the following inequality results from imposing the condition that the expected total monetary error equals the tolerable error (\(\pi\mu_\theta Y = \mathrm{TE}\)):

\[ E_\theta \operatorname{Var} X \lesssim \Big(1-\frac{\mathrm{TE}}{Y}\Big)\operatorname{Var} Y + \frac{\mathrm{TE}}{Y}\Big(1-\frac{\mathrm{TE}}{Y}\Big)\bar{Y}^2 \tag{6} \]

This upper bound is realized when all monetary errors represent 100 percent overstatements.
A lower bound on the expected variance of audited amounts when the total monetary error equals the tolerable error corresponds to the situation where the relative error is concentrated at a single value, making \(\sigma_\theta^2 = 0\). This can be seen by examining (4) and setting \(\pi\mu_\theta = \mathrm{TE}/Y\). Every term other than \((1-\pi\mu_\theta)^2\operatorname{Var} Y\) is nonnegative, and all of them vanish when, in addition, every item is in error (\(\pi = 1\)). The following inequality then holds:

\[ E_\theta \operatorname{Var} X \gtrsim \Big(1-\frac{\mathrm{TE}}{Y}\Big)^2\operatorname{Var} Y \tag{7} \]

This lower bound is realized when each account is overstated by a constant percentage.
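The bounds (6) and (7) can be seen at work in the sketch below. It evaluates the right side of (4) for three hypothetical overstatement mechanisms that share the same \(\pi\mu_\theta = \mathrm{TE}/Y = 0.05\). The values Var Y = 22,500 and \(\bar{Y}\) = 316.67 are illustrative only. The 100 percent overstatement case should land on the upper bound and the constant-percentage case on the lower bound. A uniform \(F\) falls in between.

```python
def expected_var_audited(var_y, mean_y, pi, mu, sigma2):
    """Right-hand side of (4)."""
    var_theta = pi * sigma2 + pi * (1 - pi) * mu ** 2
    return (var_theta + (1 - pi * mu) ** 2) * var_y + var_theta * mean_y ** 2

def audited_bounds(var_y, mean_y, r):
    """Lower bound (7) and upper bound (6), with r = pi * mu_theta = TE / Y."""
    lower = (1 - r) ** 2 * var_y
    upper = (1 - r) * var_y + r * (1 - r) * mean_y ** 2
    return lower, upper

var_y, mean_y, r = 22_500.0, 316.67, 0.05   # illustrative values; r = TE / Y

# Overstatement mechanisms sharing pi * mu = r, given as (pi, mu, sigma2 of F).
cases = {
    "100% overstatements": (r, 1.0, 0.0),
    "F uniform on [0, 1]": (2 * r, 0.5, 1.0 / 12.0),
    "constant overstatement": (1.0, r, 0.0),
}
lower, upper = audited_bounds(var_y, mean_y, r)
results = {name: expected_var_audited(var_y, mean_y, *p) for name, p in cases.items()}
for name, value in results.items():
    print(f"{name:24s}{value:12,.0f}   bounds [{lower:,.0f}, {upper:,.0f}]")
```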

While these results have been derived without considering the effect of stratifying the recorded amounts, similar relationships hold when the recorded amounts are stratified, provided the probability of an item's being in error is not affected by the stratification. Examining each of the relationships (2)-(7), the only change is that all hold for each stratum. To illustrate, (3) and (4) become, for the \(k\)th stratum,

Nk        Nk 

(3?)      EAl   X.,  )  =  (E  Y.,  )(l-iru_) 

0  x  ~jk     1     jk      0 

and 

(4f)      EQVar  ^  =  Tra^Cl-*  )^+(l-TTuQ)2Var  Yk  +  (tto-q+ttU-iOu2^ 

where  the  subscript  k  indicates  the  restriction  to  the  k   stratum. 

Expectation and Variance of Differences. Defining the difference as the recorded amount minus the audited amount, analogous results may be derived for the expected difference and the expected variance of the difference. From the basic definition of the model, the difference \(D_j\) may be represented as

\[ D_j = Y_j - X_j = \theta_j Y_j. \tag{8} \]

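Taking the expectation with respect to the random variable \(\theta\), the following relationship holds:

\[ E_\theta D_j = \pi\mu_\theta Y_j. \tag{9} \]

By adding over all population items, it follows that

\[ E_\theta\Big(\sum_{j=1}^{N} D_j\Big) = \pi\mu_\theta\Big(\sum_{j=1}^{N} Y_j\Big). \tag{10} \]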

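The expected variance of difference amounts is

\[ E_\theta \operatorname{Var} D \doteq \pi(\mu_\theta^2+\sigma_\theta^2)\operatorname{Var} Y + \pi\big[(1-\pi)\mu_\theta^2 + \sigma_\theta^2\big]\,\bar{Y}^2 \tag{11} \]

As anticipated, this expected variance is smaller than the expected variance of audited amounts whenever the expected audited amount exceeds fifty percent of the recorded amount. The \(\bar{Y}^2\) terms of (4) and (11) are identical. The coefficients of \(\operatorname{Var} Y\) differ by \((1-\pi\mu_\theta)^2 - (\pi\mu_\theta)^2 = 1 - 2\pi\mu_\theta\), which is positive exactly when \(\pi\mu_\theta < 1/2\).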
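The comparison between (4) and (11) can be made concrete. The sketch below uses illustrative inputs and hypothetical function names, and checks that the difference between the two expected variances reduces to \((1 - 2\pi\mu_\theta)\operatorname{Var} Y\). It also checks that the variance of differences is the smaller of the two exactly when \(\pi\mu_\theta < 1/2\).

```python
def expected_var_audited(var_y, mean_y, pi, mu, sigma2):
    """Right-hand side of (4)."""
    var_theta = pi * sigma2 + pi * (1 - pi) * mu ** 2
    return (var_theta + (1 - pi * mu) ** 2) * var_y + var_theta * mean_y ** 2

def expected_var_diff(var_y, mean_y, pi, mu, sigma2):
    """Right-hand side of (11), for D_j = theta_j * Y_j."""
    return (pi * (mu ** 2 + sigma2) * var_y
            + pi * ((1 - pi) * mu ** 2 + sigma2) * mean_y ** 2)

var_y, mean_y = 22_500.0, 316.67   # illustrative values
rows = []
for pi, mu, sigma2 in [(0.10, 0.5, 1 / 12), (0.40, 1.0, 0.0), (0.90, 0.8, 0.01)]:
    vx = expected_var_audited(var_y, mean_y, pi, mu, sigma2)
    vd = expected_var_diff(var_y, mean_y, pi, mu, sigma2)
    rows.append((pi, mu, vx, vd))
    # The mean_y ** 2 terms cancel, so vx - vd should equal (1 - 2 * pi * mu) * var_y.
    print(f"pi*mu = {pi * mu:.2f}: E Var X = {vx:,.0f}, E Var D = {vd:,.0f}")
```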

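When all monetary errors represent overstatements and the total monetary error equals the tolerable error (TE), an upper bound for the expected variance of differences is

\[ E_\theta \operatorname{Var} D \lesssim \frac{\mathrm{TE}}{Y}\operatorname{Var} Y + \frac{\mathrm{TE}}{Y}\Big(1-\frac{\mathrm{TE}}{Y}\Big)\bar{Y}^2 \tag{12} \]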

If the distribution of monetary error with the largest variance is called the least favorable, then it follows from (12) that when all monetary errors represent overstatements, the least favorable distribution of monetary error selects a proportion of the items in the population to contain the error, and each item selected is 100% overstated. This notion of a least favorable distribution was introduced by Teitlebaum [1973] in connection with dollar-unit sampling.

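The inequality (12) is obtained from (11) by maximizing the variance of the relative error (\(\sigma_\theta^2\)) subject to the conditions that \(\pi\mu_\theta = \mathrm{TE}/Y\) and that \(\theta\) takes its values between zero and one. The maximum, \(\sigma_\theta^2 = \mu_\theta(1-\mu_\theta)\), is attained when \(\theta\) takes only the values zero and one. More generally, for any value of \(\pi\mu_\theta\), the inequality may be written

\[ E_\theta \operatorname{Var} D \lesssim \pi\mu_\theta\operatorname{Var} Y + \pi\mu_\theta(1-\pi\mu_\theta)\,\bar{Y}^2 \tag{13} \]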

It is also possible to obtain a lower bound for the expected variance of differences under these same conditions. This relationship is expressed as

\[ E_\theta \operatorname{Var} D \gtrsim \Big(\frac{\mathrm{TE}}{Y}\Big)^2\operatorname{Var} Y \tag{14} \]

This inequality corresponds to the situation where every population item is overstated by the same relative monetary error, \(\mathrm{TE}/Y\).
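The maximization behind (12), and the lower bound (14), can be checked numerically. The sketch fixes \(\pi\mu_\theta = r = 0.05\) and sweeps a grid of admissible \((\pi, \mu_\theta, \sigma_\theta^2)\). Because \(\theta\) is confined to [0, 1], \(\sigma_\theta^2 \le \mu_\theta(1-\mu_\theta)\). For each grid point it evaluates the right side of (11). The inputs are illustrative. Every point on the boundary \(\sigma_\theta^2 = \mu_\theta(1-\mu_\theta)\), where \(F\) puts all its mass on 0 and 1, attains the same largest value. The sketch therefore reports only the extreme values, not a unique maximizer.

```python
def expected_var_diff(var_y, mean_y, pi, mu, sigma2):
    """Right-hand side of (11)."""
    return (pi * (mu ** 2 + sigma2) * var_y
            + pi * ((1 - pi) * mu ** 2 + sigma2) * mean_y ** 2)

var_y, mean_y, r = 22_500.0, 316.67, 0.05            # illustrative; r = TE / Y
upper = r * var_y + r * (1 - r) * mean_y ** 2        # bound (12)
lower = r ** 2 * var_y                               # bound (14)

largest, smallest = float("-inf"), float("inf")
steps = 200
for i in range(steps + 1):
    mu = r + (1 - r) * i / steps   # mu_theta in [r, 1], so pi = r / mu_theta <= 1
    pi = r / mu
    max_s2 = mu * (1 - mu)         # largest variance of F on [0, 1] with mean mu
    for j in range(steps + 1):
        value = expected_var_diff(var_y, mean_y, pi, mu, max_s2 * j / steps)
        largest = max(largest, value)
        smallest = min(smallest, value)

print(f"grid max {largest:,.0f} vs bound (12) {upper:,.0f}")
print(f"grid min {smallest:,.0f} vs bound (14) {lower:,.0f}")
```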

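From (12), when all monetary errors represent overstatements and the total monetary error equals the tolerable error, it follows that the variance of recorded amounts exceeds the expected variance of differences whenever the square of the coefficient of variation of recorded amounts is greater than \(\mathrm{TE}/Y\). Indeed, \(\operatorname{Var} Y\) exceeds the right side of (12) exactly when \((1-\mathrm{TE}/Y)\operatorname{Var} Y > (\mathrm{TE}/Y)(1-\mathrm{TE}/Y)\bar{Y}^2\). That is,

\[ \frac{\operatorname{Var} Y}{\bar{Y}^2} > \frac{\mathrm{TE}}{Y}. \]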

Because these results apply to each stratum in a stratified design, it follows that as long as the square of the coefficient of variation of recorded amounts within any stratum is larger than \(\mathrm{TE}/Y\), the variance of the stratum recorded amounts exceeds the expected variance of differences within the stratum.
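The coefficient-of-variation criterion can be evaluated directly for the four strata of Table 1, with TE/Y = .05. The sketch below compares each stratum's Var Y(h) with the upper bound (12) computed within the stratum. It then checks that the outcome agrees with the test \(\operatorname{Var} Y(h)/\bar{Y}^2(h) > \mathrm{TE}/Y\). Only stratum 3, whose squared coefficient of variation is about .036, fails the criterion.

```python
# Table 1 data: stratum h -> (N_h, Var Y(h), mean recorded amount Ybar(h)).
strata = {
    1: (5500, 6_400.0, 203.64),
    2: (3000, 22_500.0, 316.67),
    3: (1000, 40_000.0, 1_050.00),
    4: (500, 168_100.0, 1_760.00),
}
r = 0.05   # TE / Y in the example

exceeds = {}
for h, (_pop_size, var_y, mean_y) in strata.items():
    cv2 = var_y / mean_y ** 2                        # squared coefficient of variation
    bound = r * var_y + r * (1 - r) * mean_y ** 2    # right side of (12) within stratum h
    exceeds[h] = var_y > bound
    print(f"stratum {h}: CV^2 = {cv2:.4f}, Var Y(h) = {var_y:,.0f}, bound (12) = {bound:,.0f}")
```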
APPLICATION TO TESTING. In this section we shall apply the approximate expected variances to the problem of testing whether the monetary error exceeds the tolerable error (TE). We suppose that the auditor expects some monetary error (EE), and that all monetary errors represent overstatements.

Figure 1 illustrates the situation. On the left is the sampling distribution of the estimated monetary error, \(\hat{D}\), under the hypothesis that the total monetary error equals EE, the expected error. On the right is the sampling distribution of \(\hat{D}\) under the hypothesis that the total monetary error equals TE, the tolerable error. Note that the variability of the sampling distribution on the right, as measured by its standard deviation S(TE), is larger than the standard deviation on the left, S(EE). S(TE) is the standard error of the estimated difference when the population monetary error equals TE; S(EE) is the standard error of the estimated difference when the population monetary error equals EE.


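Figure 1. Sampling distributions of the estimated monetary error when the total monetary error equals EE (left) and TE (right). Horizontal axis: total monetary error.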
For a specified risk of incorrect acceptance (\(\beta\)) and a specified risk of incorrect rejection (\(\alpha\)), the auditor must determine a sample size \(n\) and a critical amount \(C\). If the estimated difference, \(\hat{D}\), exceeds \(C\), the auditor decides that the monetary error may be larger than the tolerable error (TE). If \(\hat{D}\) is less than or equal to \(C\), the auditor decides that the monetary error does not exceed the tolerable error.

The risk of incorrect acceptance is determined as the probability that \(\hat{D}\) is less than or equal to \(C\) when the total monetary error equals TE. The risk of incorrect rejection is determined as the probability that \(\hat{D}\) exceeds \(C\) when the total monetary error equals EE. Let \(z_\beta\) represent the normal table value corresponding to a risk of incorrect acceptance equal to \(\beta\), and \(z_\alpha\) the normal table value corresponding to a risk of incorrect rejection equal to \(\alpha\) (that is, the values exceeded by a standard normal variable with probabilities \(\beta\) and \(\alpha\)). The equations for the sample size, \(n\), and the critical amount, \(C\), are then

\[ \mathrm{TE} - \mathrm{EE} = z_\alpha S(\mathrm{EE}) + z_\beta S(\mathrm{TE}) \tag{15} \]

and

\[ C = \mathrm{TE} - z_\beta S(\mathrm{TE}). \tag{16} \]
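The decision rule and its two risks can be illustrated under the normal approximation. In the sketch below, S(TE) = $60,000 is a hypothetical value. S(EE) is then chosen so that (15) holds with \(\alpha = \beta = .05\), and the critical amount comes from (16). When S(EE) and S(TE) are known exactly, both risks come out at their nominal levels. In practice they are not known, and the bounds developed next address that. The sketch uses Python's `statistics.NormalDist` (Python 3.8+). The function names are illustrative.

```python
from statistics import NormalDist

def critical_amount(te, s_te, beta):
    """Formula (16): C = TE - z_beta * S(TE)."""
    return te - NormalDist().inv_cdf(1 - beta) * s_te

def risks(c, ee, te, s_ee, s_te):
    """Normal-approximation risks of the rule 'error may exceed TE when D_hat > C'."""
    incorrect_acceptance = NormalDist(te, s_te).cdf(c)      # P(D_hat <= C | error = TE)
    incorrect_rejection = 1 - NormalDist(ee, s_ee).cdf(c)   # P(D_hat > C | error = EE)
    return incorrect_acceptance, incorrect_rejection

alpha = beta = 0.05
te, ee = 200_000.0, 20_000.0
s_te = 60_000.0                                    # hypothetical S(TE)
z_a = NormalDist().inv_cdf(1 - alpha)
z_b = NormalDist().inv_cdf(1 - beta)
s_ee = (te - ee - z_b * s_te) / z_a                # chosen so that (15) holds

c = critical_amount(te, s_te, beta)
ia, ir = risks(c, ee, te, s_ee, s_te)
print(f"S(EE) = {s_ee:,.0f}, C = {c:,.0f}, risks: acceptance {ia:.3f}, rejection {ir:.3f}")
```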

Solving these equations for \(n\) and \(C\) requires knowing the standard deviations S(EE) and S(TE). To obtain an approximate solution, we can use the inequality (13) developed in the previous section for the case where all monetary errors represent overstatements. Neglecting stratification and the finite population correction factor for the moment, the following inequalities hold:
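\[ S(\mathrm{EE}) \le N\sqrt{\frac{1}{n}\cdot\frac{\mathrm{EE}}{Y}}\,\sqrt{\operatorname{Var} Y + \Big(1-\frac{\mathrm{EE}}{Y}\Big)\bar{Y}^2} \]

and

\[ S(\mathrm{TE}) \le N\sqrt{\frac{1}{n}\cdot\frac{\mathrm{TE}}{Y}}\,\sqrt{\operatorname{Var} Y + \Big(1-\frac{\mathrm{TE}}{Y}\Big)\bar{Y}^2} \]

These follow from (13) because the estimated total difference is \(N\) times the sample mean difference, whose variance is approximately \(\operatorname{Var} D/n\) when the finite population correction is ignored.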

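Replacing S(EE) and S(TE) by these upper limits, we can solve (15) for the sample size:

\[ n = \frac{N^2\Big[z_\alpha\sqrt{\frac{\mathrm{EE}}{Y}\Big(\operatorname{Var} Y + \big(1-\frac{\mathrm{EE}}{Y}\big)\bar{Y}^2\Big)} + z_\beta\sqrt{\frac{\mathrm{TE}}{Y}\Big(\operatorname{Var} Y + \big(1-\frac{\mathrm{TE}}{Y}\big)\bar{Y}^2\Big)}\Big]^2}{(\mathrm{TE}-\mathrm{EE})^2} \tag{17} \]

This expression can be simplified and made somewhat larger by substituting \((1-\mathrm{EE}/Y)\) for \((1-\mathrm{TE}/Y)\) and rewriting it as

\[ n \approx \frac{N^2\Big(z_\alpha\sqrt{\frac{\mathrm{EE}}{Y}} + z_\beta\sqrt{\frac{\mathrm{TE}}{Y}}\Big)^2\Big[\operatorname{Var} Y + \big(1-\frac{\mathrm{EE}}{Y}\big)\bar{Y}^2\Big]}{(\mathrm{TE}-\mathrm{EE})^2} \tag{18} \]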
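Formulas (17) and (18) translate directly into code. The sketch below ignores the finite population correction, as the text does. It takes N = 10,000, Y = $4,000,000, EE = $20,000 and TE = $200,000 from the numerical example later in the paper. The population variance Var Y = 180,000 is a hypothetical value, since the paper reports only stratum variances. As the text leads one to expect, (18) gives a slightly larger n than (17). The function name is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_unstratified(pop_size, total_y, var_y, ee, te, alpha, beta,
                             simplified=False):
    """Sample size from (17), or from the simplified, slightly larger (18).

    Neglects the finite population correction, as in the text.
    """
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(1 - beta)
    mean_y = total_y / pop_size
    re, rt = ee / total_y, te / total_y          # EE/Y and TE/Y
    s_ee = math.sqrt(re * (var_y + (1 - re) * mean_y ** 2))
    # (18) replaces (1 - TE/Y) by the larger (1 - EE/Y) in the TE term.
    factor = (1 - re) if simplified else (1 - rt)
    s_te = math.sqrt(rt * (var_y + factor * mean_y ** 2))
    return (pop_size * (z_a * s_ee + z_b * s_te) / (te - ee)) ** 2

args = dict(pop_size=10_000, total_y=4_000_000.0, var_y=180_000.0,  # var_y is hypothetical
            ee=20_000.0, te=200_000.0, alpha=0.05, beta=0.05)
n17 = sample_size_unstratified(**args)
n18 = sample_size_unstratified(**args, simplified=True)
print(f"(17): n = {math.ceil(n17)}   (18): n = {math.ceil(n18)}")
```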

Now we are ready to consider the situation where stratification is used. We suppose that the stratification is based on the recorded amounts and uses some acceptable technique, such as the cumulative square root of frequency rule. As in the unstratified case, the two formulas (15) and (16) are to be solved to determine the required sample size and critical amount.

An additional decision to be made when a stratified plan is used is how to allocate the sample to the strata. Neyman allocation, in which the sample is divided among the strata in proportion to the product of the stratum population size and the stratum standard deviation, is a commonly used procedure. Using this allocation method, the question is which standard deviation to use. One choice would be the standard deviation of recorded amounts. A better choice would be the standard deviation of the differences when the total monetary error equals either the expected error amount (EE) or the tolerable error amount (TE).

While neither of the latter two is known, the inequality (13) provides a useful upper limit for the situation when the monetary errors all represent overstatements. Using the inequality, the two possible allocations are

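\[ n_h = n\,\frac{N_h\sqrt{\operatorname{Var} Y(h) + \big(1-\frac{\mathrm{EE}}{Y}\big)\bar{Y}^2(h)}}{\sum_{k=1}^{L} N_k\sqrt{\operatorname{Var} Y(k) + \big(1-\frac{\mathrm{EE}}{Y}\big)\bar{Y}^2(k)}} \tag{19} \]

and

\[ n_h = n\,\frac{N_h\sqrt{\operatorname{Var} Y(h) + \big(1-\frac{\mathrm{TE}}{Y}\big)\bar{Y}^2(h)}}{\sum_{k=1}^{L} N_k\sqrt{\operatorname{Var} Y(k) + \big(1-\frac{\mathrm{TE}}{Y}\big)\bar{Y}^2(k)}} \tag{20} \]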

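In these equations, \(h\) denotes stratum \(h\), and there are a total of \(L\) strata. The factor \(\sqrt{\mathrm{EE}/Y}\) (or \(\sqrt{\mathrm{TE}/Y}\)) implied by (13) is common to all strata and cancels.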

As a numerical example, we adapt an example described in Roberts [1978], p. 98. A population of 10,000 items with a total recorded amount of $4,000,000 is divided into four strata. Table 1 gives the facts for this example. Additionally, we assume that TE = $200,000 (TE/Y = .05) and EE = $20,000 (EE/Y = .005).


Table 1
Stratum     N_h      Var Y(h) ($²)    Mean recorded amount Ȳ(h) ($)
   1       5,500         6,400               203.64
   2       3,000        22,500               316.67
   3       1,000        40,000             1,050.00
   4         500       168,100             1,760.00
The following table shows the percentage of the sample allocated to each stratum by (19) and (20), as well as by Neyman allocation based on the standard deviation of recorded amounts.

Percentage of sample
Stratum    Recorded amounts    EE = $20,000    TE = $200,000
   1            34.00              28.48            28.48
   2            34.67              24.88            24.92
   3            15.33              25.27            25.24
   4            16.00              21.37            21.36

These results illustrate that the allocation differs little between using EE and TE, but both give a different allocation from that based on the recorded amounts. Consequently, we shall use Neyman allocation calculated using the upper limit at EE. This choice simplifies the formulas for calculating the sample size and critical amount without affecting the resulting allocation very much.
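The allocation percentages in the table above can be recomputed from Table 1. The sketch below applies Neyman allocation with three choices of stratum standard deviation: the recorded-amount standard deviation, and the square roots of the brackets in (19) and (20). The helper names are illustrative. The recomputed EE and TE columns agree with the table to within about 0.01 percentage point. The recorded-amount column differs from the table by up to about 0.2 point.

```python
import math

# Table 1: stratum h -> (N_h, Var Y(h), mean recorded amount Ybar(h)).
strata = {1: (5500, 6_400.0, 203.64), 2: (3000, 22_500.0, 316.67),
          3: (1000, 40_000.0, 1_050.00), 4: (500, 168_100.0, 1_760.00)}

def neyman_percentages(sd_of):
    """Percent of the sample per stratum, proportional to N_h * sd_of(Var Y(h), Ybar(h))."""
    weights = {h: pop_size * sd_of(v, m) for h, (pop_size, v, m) in strata.items()}
    total = sum(weights.values())
    return {h: 100 * w / total for h, w in weights.items()}

def bound_sd(ratio):
    """Square root of the bracket in (19)/(20); the common sqrt(ratio) factor cancels."""
    return lambda v, m: math.sqrt(v + (1 - ratio) * m ** 2)

recorded = neyman_percentages(lambda v, m: math.sqrt(v))
at_ee = neyman_percentages(bound_sd(0.005))   # (19), EE/Y = .005
at_te = neyman_percentages(bound_sd(0.05))    # (20), TE/Y = .05
for h in strata:
    print(f"{h}  {recorded[h]:6.2f}  {at_ee[h]:6.2f}  {at_te[h]:6.2f}")
```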

The following expression represents the upper limit for the standard error of the stratified estimate \(\hat{D}_{st}\), based on \(L\) strata, when the monetary error equals TE:

\[ S(\mathrm{TE}) \le \sqrt{\frac{\mathrm{TE}}{Y}}\,\sqrt{\sum_{h=1}^{L}\Big(\frac{N_h^2}{n_h} - N_h\Big)\Big(\operatorname{Var} Y(h) + \big(1-\frac{\mathrm{TE}}{Y}\big)\bar{Y}^2(h)\Big)} \tag{21} \]

The corresponding inequality for S(EE) is of the same form, but with EE replacing TE.

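Formula (15) can now be used to determine the sample size. Using that formula, we replace the stratum sample sizes \(n_h\) by the expression (19) and use the inequalities for S(TE) and S(EE) given by (21). After some simplification, and after replacing \((1-\mathrm{TE}/Y)\) by \((1-\mathrm{EE}/Y)\) in the bound for S(TE), the formula for \(n\) can be expressed as

\[ n = \frac{\Big(z_\alpha\sqrt{\frac{\mathrm{EE}}{Y}} + z_\beta\sqrt{\frac{\mathrm{TE}}{Y}}\Big)^2\Big(\sum_{h=1}^{L} N_h\sqrt{\operatorname{Var} Y(h) + \big(1-\frac{\mathrm{EE}}{Y}\big)\bar{Y}^2(h)}\Big)^2}{(\mathrm{TE}-\mathrm{EE})^2 + \Big(z_\alpha\sqrt{\frac{\mathrm{EE}}{Y}} + z_\beta\sqrt{\frac{\mathrm{TE}}{Y}}\Big)^2\sum_{h=1}^{L} N_h\Big(\operatorname{Var} Y(h) + \big(1-\frac{\mathrm{EE}}{Y}\big)\bar{Y}^2(h)\Big)} \tag{22} \]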

When this formula is applied to the previous numerical example with \(\alpha = \beta = .05\), the resulting sample size is 125, allocated among the four strata as 35, 31, 31, and 28. For comparison, suppose the sample size is determined by using the recorded amounts. In that case, the required sample size is 518. Two factors cause this large difference. First, in this case the standard deviation of the stratum recorded amounts is larger than the standard deviation of differences under EE and, except for stratum 3, under TE. Second, using the recorded amounts does not permit using the fact that the standard deviation under EE is smaller than under TE.
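Formula (22) can be evaluated for the Table 1 example with \(\alpha = \beta = .05\). The sketch below gives n of about 125.6, consistent with the reported 125. For the comparison with recorded amounts, it assumes one reading of the text: both S(EE) and S(TE) are taken from the recorded-amount standard deviations, with Neyman allocation on those standard deviations. Under that assumption it gives roughly 521. That is close to, but not identical with, the reported 518, and the small gap may come from rounding in the original computation. The helper name is illustrative.

```python
import math
from statistics import NormalDist

# Table 1: stratum h -> (N_h, Var Y(h), mean recorded amount Ybar(h)).
strata = {1: (5500, 6_400.0, 203.64), 2: (3000, 22_500.0, 316.67),
          3: (1000, 40_000.0, 1_050.00), 4: (500, 168_100.0, 1_760.00)}
total_y, ee, te, alpha, beta = 4_000_000.0, 20_000.0, 200_000.0, 0.05, 0.05

z_a = NormalDist().inv_cdf(1 - alpha)
z_b = NormalDist().inv_cdf(1 - beta)
re, rt = ee / total_y, te / total_y

def stratified_n(k, s2):
    """n = k (sum N_h s_h)^2 / ((TE - EE)^2 + k sum N_h s_h^2), the form of (22)."""
    a = sum(pop_size * math.sqrt(s2[h]) for h, (pop_size, _, _) in strata.items()) ** 2
    b = sum(pop_size * s2[h] for h, (pop_size, _, _) in strata.items())
    return k * a / ((te - ee) ** 2 + k * b)

# Brackets of (22) with (1 - EE/Y), and the coefficient (z_a sqrt(EE/Y) + z_b sqrt(TE/Y))^2.
s2_bound = {h: v + (1 - re) * m ** 2 for h, (_, v, m) in strata.items()}
n_model = stratified_n((z_a * math.sqrt(re) + z_b * math.sqrt(rt)) ** 2, s2_bound)

# Assumed comparison: S(EE) and S(TE) both based on the recorded-amount variances.
s2_rec = {h: v for h, (_, v, _) in strata.items()}
n_recorded = stratified_n((z_a + z_b) ** 2, s2_rec)

print(f"model-based n = {n_model:.1f}, recorded-amount n = {n_recorded:.1f}")
```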

Having determined an appropriate sample size for a stratified design, the critical amount \(C\) may be obtained by using the upper limit (21) as a proxy for S(TE) in formula (16). The stratified difference estimator, \(\hat{D}_{st}\), is compared to the critical amount as described earlier in the paper. The critical amount \(C\) is determined by the following equation:

\[ C = \mathrm{TE} - z_\beta\sqrt{\frac{\mathrm{TE}}{Y}}\,\sqrt{\sum_{h=1}^{L}\Big(\frac{N_h^2}{n_h} - N_h\Big)\Big(\operatorname{Var} Y(h) + \big(1-\frac{\mathrm{TE}}{Y}\big)\bar{Y}^2(h)\Big)} \]

Continuing the numerical example, the critical amount is C = $65,848. The decision rule is to decide that the total monetary error exceeds $200,000 when the stratified estimator of the monetary difference exceeds $65,848.
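The critical amount can be computed from the bound (21) and formula (16). The sketch uses the stratum sample sizes 35, 31, 31 and 28 reported above, and takes \(z_\beta\) from Python's `statistics.NormalDist`. It yields C of roughly $65,700, near the reported $65,848. The modest gap may again reflect rounding in the original. The function name `decide` is a hypothetical label for the decision rule.

```python
import math
from statistics import NormalDist

# Table 1: stratum h -> (N_h, Var Y(h), mean recorded amount Ybar(h)).
strata = {1: (5500, 6_400.0, 203.64), 2: (3000, 22_500.0, 316.67),
          3: (1000, 40_000.0, 1_050.00), 4: (500, 168_100.0, 1_760.00)}
alloc = {1: 35, 2: 31, 3: 31, 4: 28}       # stratum sample sizes reported in the text
total_y, te, beta = 4_000_000.0, 200_000.0, 0.05
rt = te / total_y

def s_te_bound():
    """Upper limit (21) on S(TE), including the finite population correction."""
    acc = 0.0
    for h, (pop_size, var_y, mean_y) in strata.items():
        acc += (pop_size ** 2 / alloc[h] - pop_size) * (var_y + (1 - rt) * mean_y ** 2)
    return math.sqrt(rt * acc)

z_b = NormalDist().inv_cdf(1 - beta)
c = te - z_b * s_te_bound()      # formula (16) with the bound in place of S(TE)

def decide(d_st):
    """Decision rule: True means 'the total monetary error may exceed TE'."""
    return d_st > c

print(f"C = {c:,.0f}")
```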
SUMMARY AND CONCLUSIONS. Adopting a superpopulation approach to modelling the distributional characteristics of audited amounts provides a useful basis for planning and evaluating stratified random samples. When the monetary errors represent overstatements, we have derived upper bounds on the expected variance of the stratified difference estimator. Using these upper bounds, it is possible to stratify the population, determine an appropriate sample size, and determine a decision rule for evaluating the sample results.

Using this modelling approach when faced with overstatements, we are able to avoid some of the difficulties associated with the more commonly used stratified sampling designs. The foremost of these is the problem of observing very few monetary errors in the sample, which Neter and Loebbecke [1975] observed in their simulation study. The model approach does not depend on the sample to provide an estimate of the standard deviation of population differences, and hence will perform well regardless of the number of errors observed in the sample.

Another difficulty noted in the literature is the failure of the standardized estimator (the estimator minus its mean, divided by its estimated standard error) to be approximately normally distributed. Kaplan [1973b] observed that the correlation between the estimator and its estimated standard error was responsible for this failure. The model approach developed here depends only on the approximate normality of the estimator itself. Examining the results of Neter and Loebbecke [1975] for populations 3 and 4, we see evidence that the stratified difference estimator has a distribution that is reasonably close to normal when the population error percentage is at least five. Consequently, the use of normal table factors should produce reasonably good results.

Finally, the modelling approach presented here overcomes some of the theoretical difficulties caused by the fact that the standard deviation of audited amounts (or difference amounts) increases as the amount of monetary error increases.